Re: [-empyre-] Archives, metadata and searching

Luciana Duranti <luciana@interchange.ubc.ca> · Sun, 06 Feb 2005 13:27:31 -0800

Dear Simon and Stephen (and everybody else of course):
I would like to explain briefly the approach of InterPARES to metadata.
In the past couple of years, InterPARES has been building a database 
registering existing metadata schemata and analyzing them according to 
specific criteria aiming to establish whether a metadata schema is able to 
provide evidence that a record is as accurate, reliable and authentic as it 
was when first saved. Thus, our analyses are not concerned with the 
retrievability of records, other than indirectly, in the sense that, if a 
schema is able to satisfy our requirements, it will also be a powerful 
retrieval instrument.
Having looked at several schemata and at several case studies of different 
types of records creators (in the arts, sciences, and e-government) who are 
either using existing schemata or generating personalized ones, we have 
arrived at the conclusion that any schema is adequate for any purpose if 
it allows one to identify the record in context and to establish its integrity. 
In other words, every digital entity should have identity metadata and 
integrity metadata. The former are the attributes that uniquely identify a 
record and distinguish it from any other record. For a letter, they would 
be attributes like names of creator (the person in whose archives the 
letter is maintained), author (human or organizational person issuing the 
letter), addressee, writer (the person articulating the discourse), date on 
the document, dates of transmission, receipt and archiving, subject matter, 
filing code, filing codes of the previous and subsequent letters, format, 
attachments, etc. For a telescope observation record, they would 
be attributes like name of star, inclination of telescope, time of 
observation, light curve, etc. Every creator should identify what is needed 
for identification (and therefore retrieval) of its own records. Integrity 
metadata are data about responsibility for the record and for its changes 
over time. They include things like name of the person responsible for 
handling or for keeping the record, changes made to the record, dates and 
results of updates, upgrades, migrations, etc. The purpose is to 
demonstrate control over the maintenance process and to justify changes. The 
reason is that, years later, one wants to be able to demonstrate that the 
entity copyrighted or linked to somebody's intellectual rights 10 years 
before is the same entity, even if it looks a bit different.
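
As a rough illustration only, the split between identity metadata and integrity metadata for a letter could be sketched in Python as below. Every class and field name here is hypothetical and chosen for the example; none of them comes from an InterPARES schema, and each creator would choose its own attributes.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class IdentityMetadata:
    """Attributes that distinguish one letter from any other (hypothetical fields)."""
    creator: str        # person in whose archives the letter is maintained
    author: str         # person issuing the letter
    addressee: str
    writer: str         # person articulating the discourse
    document_date: str
    subject: str
    filing_code: str


@dataclass
class IntegrityEvent:
    """One documented change to the record (update, upgrade, migration, ...)."""
    date: str
    action: str
    responsible: str
    result: str


@dataclass
class IntegrityMetadata:
    """Responsibility for the record and for its changes over time."""
    handler: str
    events: List[IntegrityEvent] = field(default_factory=list)


@dataclass
class Record:
    identity: IdentityMetadata
    integrity: IntegrityMetadata

    def log_change(self, event: IntegrityEvent) -> None:
        # Changes are appended to the integrity metadata; identity is untouched.
        self.integrity.events.append(event)

    def is_same_entity(self, earlier_identity: IdentityMetadata) -> bool:
        # Dataclass equality compares the identity attributes field by field.
        return self.identity == earlier_identity


letter = Record(
    identity=IdentityMetadata(
        creator="A. Smith", author="B. Jones", addressee="A. Smith",
        writer="B. Jones", document_date="2005-02-06",
        subject="Metadata schemata", filing_code="CORR-12",
    ),
    integrity=IntegrityMetadata(handler="Records office"),
)
identity_at_registration = replace(letter.identity)  # independent copy

letter.log_change(IntegrityEvent(
    date="2015-02-06", action="migration",
    responsible="Records office", result="converted to a newer file format",
))
```

In this sketch the migration leaves the identity attributes unchanged and is documented in the integrity metadata, which is what lets one argue years later that the record is the same entity even if it looks a bit different.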
Now, all the metadata indicated above are the responsibility of the creator 
and chosen by the creator. Once the digital record goes to the preserver, 
it goes as part of an aggregation of material. The preserver should use 
metadata schemata representing the identifying attributes and the integrity 
information of the aggregation, not of its individual components. Linked to 
the metadata for the aggregation should be all the documentation related to 
that unit (name of creator, type and scope of material, historical 
development, how the material was originally used, circumstances of 
acquisition, internal relationships among its parts, technological 
characteristics, other related material, how the preserver has upgraded the 
material to keep it accessible, consequent changes, etc.; we call 
this archival description), and directions on how to retrieve things once 
one is inside the aggregation. Once the aggregation is retrieved by the 
user on the basis of the preserver metadata, then the original metadata 
schemata of the creator are used to get the specific record.
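
The two-step retrieval just described can be sketched in Python as follows. This is a minimal sketch under my own assumptions: the `Aggregation` class, the two search functions and the sample holdings are all hypothetical, not part of any existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Aggregation:
    """A unit acquired by the preserver, described as a whole (hypothetical)."""
    aggregation_id: str
    creator: str
    description: str  # stands in for the archival description
    # Individual records keep whatever metadata schema the creator chose.
    records: List[Dict[str, str]] = field(default_factory=list)


def find_aggregations(holdings: List[Aggregation], **criteria: str) -> List[Aggregation]:
    """Step 1: search preserver metadata, at the aggregation level only."""
    return [a for a in holdings
            if all(getattr(a, key, None) == value for key, value in criteria.items())]


def find_records(aggregation: Aggregation, **criteria: str) -> List[Dict[str, str]]:
    """Step 2: inside one aggregation, search the creator's own metadata."""
    return [r for r in aggregation.records
            if all(r.get(key) == value for key, value in criteria.items())]


holdings = [
    Aggregation(
        aggregation_id="AGG-1", creator="Observatory",
        description="Telescope observation records",
        records=[
            {"star": "Vega", "time_of_observation": "2004-11-02T21:15"},
            {"star": "Altair", "time_of_observation": "2004-11-03T22:40"},
        ],
    ),
    Aggregation(
        aggregation_id="AGG-2", creator="A. Smith",
        description="Correspondence",
        records=[{"author": "B. Jones", "filing_code": "CORR-12"}],
    ),
]

observatory = find_aggregations(holdings, creator="Observatory")[0]
vega_observations = find_records(observatory, star="Vega")
```

Note that the two aggregations use entirely different record-level attributes; the preserver's search never needs to know them, because it only works on the attributes of the aggregation.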
To make a long story short, we do not believe in one size fits all. We 
believe that metadata schemata should be built according to the same 
principles, but should be different from creator to creator unless the 
creators are doing the same things and producing the same records (which is 
usually true only in government and some types of businesses). We also 
believe that preservers should not be attaching metadata to records, but to 
the entire entity that they acquire as a unit, and should not be telling 
records creators what metadata to use, other than advising them in general 
on the principles that should guide their choice.
With all the above said, on Feb. 20 all InterPARES archival theorists will 
get together to sort out the metadata concept on the basis of findings to 
date, so everything may change... but not by much, I think.
Cheers,
Luciana